Chapter 3 UMAP Projection of 150 Singular Vectors

# The UMAP projection was computed once and cached; uncomment to regenerate it.
# svd_ump = umap(svd$v)
# save(svd_ump, file='svd_ump.RData')
load('svd_ump.RData')
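
The commented-out lines above show that the projection was computed once and then cached to disk. A minimal sketch of automating that compute-or-load step follows. `load_or_compute` is a hypothetical helper, not part of the original code, and it uses `saveRDS`/`readRDS` instead of `save`/`load` so that the result can be assigned to any name.

```r
# Hypothetical helper: return the cached object if the file exists,
# otherwise run `compute`, cache its result, and return it.
load_or_compute <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- compute()
  saveRDS(result, path)
  result
}

# Usage sketch (assumes `svd` and `umap()` are already available;
# the file name 'svd_ump.rds' is made up for illustration):
# svd_ump <- load_or_compute('svd_ump.rds', function() umap(svd$v))
```

Deleting the cache file forces the projection to be recomputed on the next run.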

# Interactive scatter of the two UMAP coordinates; hovering shows each document's heading and text.
fig <- plot_ly(type = 'scatter', mode = 'markers')
fig <- fig %>%
  add_trace(
    x = svd_ump$layout[,1],
    y = svd_ump$layout[,2],
    text = ~paste('heading:', head, '<br>text:', raw_text),
    hoverinfo = 'text',
    marker = list(color = 'green'),
    showlegend = FALSE
  )

fig

A few outliers stretch the axes, so the main body of points is only legible after zooming in. These outliers form tight clusters of related documents, which is worth noting, but from here on we omit them when plotting so that the main plot is readable without zooming.

# Keep only points whose two UMAP coordinates both lie strictly within (-20, 20).
index_subset <- abs(svd_ump$layout[,1]) < 20 & abs(svd_ump$layout[,2]) < 20
data_subset <- svd_ump$layout[index_subset,]
raw_text_subset <- raw_text[index_subset]
head_subset <- head[index_subset]
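
The cutoff of 20 is hard-coded, so it can help to check how many points the filter drops before discarding them. The sketch below wraps the same box filter in a hypothetical helper, `box_filter`, and applies it to a small made-up layout matrix, not to the real `svd_ump$layout`.

```r
# Hypothetical helper: TRUE for rows whose first two coordinates
# both lie strictly inside (-limit, limit).
box_filter <- function(layout, limit = 20) {
  abs(layout[, 1]) < limit & abs(layout[, 2]) < limit
}

# Made-up 2-column layout: three central points and two outliers.
toy_layout <- rbind(
  c(  1.0,  -2.5),
  c( -3.2,   4.1),
  c(  0.3,   0.7),
  c( 35.0,   1.0),   # outlier on the first axis
  c( -2.0, -41.0)    # outlier on the second axis
)

keep <- box_filter(toy_layout)
table(keep)                        # how many points are kept vs. dropped
toy_layout[!keep, , drop = FALSE]  # inspect the dropped points before discarding them
```

Looking at the dropped rows first makes it easy to confirm that only the outlying clusters are removed.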

# Same scatter as before, restricted to the points that passed the outlier filter.
fig <- plot_ly(type = 'scatter', mode = 'markers')
fig <- fig %>%
  add_trace(
    x = data_subset[,1],
    y = data_subset[,2],
    text = ~paste('heading:', head_subset, '<br>text:', raw_text_subset),
    hoverinfo = 'text',
    marker = list(color = 'green'),
    showlegend = FALSE
  )

fig

With the outliers omitted, the plot is readable without zooming, and the points appear to separate into visually distinct clusters.
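
Visual separation in a UMAP plot is suggestive, not conclusive. One rough way to put a number on it, sketched here on synthetic data, is to run k-means on the two coordinates and look at the share of the total sum of squares explained by the cluster assignment. The choice of k-means, the number of clusters, and the toy data are all assumptions for illustration; none of them come from the original analysis.

```r
set.seed(1)

# Synthetic stand-in for a 2-D layout: two well-separated blobs of 100 points each.
toy_layout <- rbind(
  matrix(rnorm(200, mean = -5), ncol = 2),
  matrix(rnorm(200, mean =  5), ncol = 2)
)

# k-means with k = 2 (an assumed value); nstart restarts guard against poor initializations.
km <- kmeans(toy_layout, centers = 2, nstart = 10)

# Fraction of the total sum of squares that lies between clusters;
# values near 1 indicate tight, well-separated clusters.
separation <- km$betweenss / km$totss
separation
```

On the real data the same check could be applied to `data_subset`, with the number of clusters chosen to match what the plot suggests.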